PIL: the original Python Imaging Library, the de facto standard image library for Python 2.
Pillow: a fork of PIL that supports Python 3 and is the preferred modern library for image manipulation in Python.
It is even required for simple image loading and saving in other Python scientific libraries such as SciPy and Matplotlib.
# check Pillow version number
import PIL
print('Pillow Version:', PIL.__version__)
Images are typically in PNG or JPEG format and can be loaded directly using the open() function on the Image class.
This returns an Image object that contains the pixel data for the image as well as details about the image.
The format property on the image will report the image format (e.g. JPEG).
The mode property will report the pixel channel format (e.g. RGB or CMYK).
The size property will report the dimensions of the image in pixels (e.g. 640 × 480).
The show() function will display the image using your operating system's default application.
from IPython.display import display
from PIL import Image
# load the image
image = Image.open('opera_house.jpg')
# summarize some details about the image
print(image.format)
print(image.mode)
print(image.size)
# show the image
# image.show()
display(image)
With Pillow installed, you can also use the Matplotlib library to load the image and display it within a Matplotlib frame.
This can be achieved using the imread() function that loads the image as an array of pixels directly and the imshow() function that will display an array of pixels as an image.
# load and display an image with Matplotlib
from matplotlib import image
import matplotlib.pyplot as plt
# load image as pixel array
data = image.imread('opera_house.jpg')
# summarize shape of the pixel array
print(data.dtype)
print(data.shape)
# display the array of pixels as an image
plt.imshow(data)
plt.show()
The Matplotlib wrapper functions can be more convenient than using Pillow directly when you only need the image as an array of pixels.
You can access the pixel data from a Pillow Image. Perhaps the simplest way is to construct a NumPy array and pass in the Image object. The process can be reversed, converting a given array of pixel data into a Pillow Image object using the Image.fromarray() function. This can be useful if image data is manipulated as a NumPy array and you then want to save it later as a PNG or JPEG file.
# load image and convert to and from NumPy array
import numpy as np
# load the image
image = Image.open('opera_house.jpg')
# convert image to numpy array
data = np.asarray(image)
# summarize shape
print(data.shape)
# create Pillow image
image2 = Image.fromarray(data)
# summarize image details (format is None because the image was created in memory)
print(image2.format)
print(image2.mode)
print(image2.size)
The Matplotlib imread() function uses fewer lines of code than loading and converting a Pillow Image object and may be preferred.
An Image object can be saved by calling the save() function. This can be useful if you want to save an image in a different format, in which case the format argument can be specified.
# load the image
image = Image.open('opera_house.jpg')
# save as PNG format
image.save('opera_house.png', format='PNG')
# load the image again and inspect the format
image2 = Image.open('opera_house.png')
print(image2.format)
! ls -lh | grep "opera_house"
There are a number of ways to convert an image to grayscale, but Pillow provides the convert() function, and passing mode 'L' will convert an image to grayscale.
# load the image
image = Image.open('opera_house.jpg')
# convert the image to grayscale
gs_image = image.convert(mode='L')
# save in jpeg format
gs_image.save('opera_house_grayscale.jpg')
# load the image again and show it
image2 = Image.open('opera_house_grayscale.jpg')
# show the image
# image2.show()
display(image2)
! ls -lh | grep "gray"
Sometimes it is desirable to thumbnail all images to have the same width or height. This can be achieved with Pillow using the thumbnail() function.
The function takes a tuple with the width and height, and the image will be resized so that the width and height of the image are equal to or smaller than the specified shape.
The original aspect ratio is preserved.
# load the image
image = Image.open('opera_house.jpg')
# report the size of the image
print(image.size)
# create a thumbnail and preserve aspect ratio
image.thumbnail((100,100))
# report the size of the modified image
print(image.size)
# show the image
# image.show()
display(image)
We may not want to preserve the aspect ratio, and instead, we may want to force the pixels into a new shape. This can be achieved using the resize() function that allows you to specify the width and height in pixels and the image will be reduced or stretched to fit the new shape.
Standard resampling algorithms are used to invent or remove pixels when resizing, and you can specify a technique, although the default is a bicubic resampling algorithm that suits most general applications.
# load the image
image = Image.open('opera_house.jpg')
# report the size of the image
print(image.size)
# resize image and ignore original aspect ratio
img_resized = image.resize((200,200))
# report the size of the thumbnail
print(img_resized.size)
# show the image
# img_resized.show()
display(img_resized)
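As a sketch of choosing the resampling technique explicitly, the resample argument accepts standard Pillow filters such as Image.NEAREST and Image.BICUBIC (the synthetic image here stands in for opera_house.jpg to keep the example self-contained):

```python
from PIL import Image
import numpy as np

# synthetic image stands in for Image.open('opera_house.jpg')
image = Image.fromarray(np.random.randint(0, 256, (120, 160, 3), dtype='uint8'))
# same target shape, two different resampling filters
fast = image.resize((200, 200), resample=Image.NEAREST)   # blocky but fast
smooth = image.resize((200, 200), resample=Image.BICUBIC) # smoother interpolation
print(fast.size, smooth.size)  # (200, 200) (200, 200)
```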
# load image
image = Image.open('opera_house.jpg')
# horizontal flip
hoz_flip = image.transpose(Image.FLIP_LEFT_RIGHT)
# vertical flip
ver_flip = image.transpose(Image.FLIP_TOP_BOTTOM)
# plot all three images using matplotlib
plt.figure(figsize=(30,10))
plt.subplot(311)
plt.imshow(image)
plt.subplot(312)
plt.imshow(hoz_flip)
plt.subplot(313)
plt.imshow(ver_flip)
plt.show()
You will note that the imshow() function can plot the Image object directly without having to convert it to a NumPy array.
An image can be rotated using the rotate() function and passing in the angle for the rotation.
The function offers additional control such as whether or not to expand the dimensions of the image to fit the rotated pixel values (default is to clip to the same size), where to center the rotation of the image (default is the center), and the fill color for pixels outside of the image (default is black).
# load image
image = Image.open('opera_house.jpg')
# plot original image
plt.figure(figsize=(30,10))
plt.subplot(311)
plt.imshow(image)
# rotate 45 degrees
plt.subplot(312)
plt.imshow(image.rotate(45))
# rotate 90 degrees
plt.subplot(313)
plt.imshow(image.rotate(90))
plt.show()
You can see that in both rotations, the pixels are clipped to the original dimensions of the image and that the empty pixels are filled with black color.
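The expand behavior mentioned earlier can be sketched as follows; a synthetic image stands in for opera_house.jpg (an assumption to keep the example self-contained):

```python
from PIL import Image
import numpy as np

# synthetic 200x100 image stands in for Image.open('opera_house.jpg')
image = Image.fromarray(np.zeros((100, 200, 3), dtype='uint8'))
# default: rotated pixels are clipped to the original canvas
clipped = image.rotate(45)
# expand=True: the canvas grows to hold every rotated pixel
expanded = image.rotate(45, expand=True)
print(clipped.size)   # (200, 100)
print(expanded.size)  # larger in both dimensions
```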
An image can be cropped: that is, a piece can be cut out to create a new image, using the crop() function.
The argument is a tuple of (left_top_x, left_top_y, right_bot_x, right_bot_y) coordinates.
# load image
image = Image.open('opera_house.jpg')
# create a cropped image
cropped = image.crop((100, 100, 200, 200))
# show cropped image
# cropped.show()
display(cropped)
# load the image
image = Image.open('sydney_bridge.jpg')
# summarize some details about the image
print(image.format)
print(image.mode)
print(image.size)
# show the image
# image.show()
display(image)
Neural networks process inputs using small weight values, and inputs with large integer values can disrupt or slow down the learning process.
As such, it is good practice to normalize the pixel values so that each pixel value lies between 0 and 1.
# load image
image = Image.open('sydney_bridge.jpg')
pixels = np.asarray(image)
# confirm pixel range is 0-255
print('Data Type: %s' % pixels.dtype)
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
# convert from integers to floats
pixels = pixels.astype('float32')
# normalize to the range 0-1
pixels /= 255.0
# confirm the normalization
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
A popular data preparation technique for image data is to subtract the mean value from the pixel values. This approach is called centering.
Centering can be performed before or after normalization. Centering after normalization might be preferred, although it might be worth testing both approaches.
Centering requires that a mean pixel value be calculated prior to subtracting it from the pixel values. There are multiple ways that the mean can be calculated; for example:
Per image.
Per minibatch of images (under stochastic gradient descent).
Per training dataset.
The mean can be calculated for all pixels in the image, referred to as a global centering, or it can be calculated for each channel in the case of color images, referred to as local centering.
Global Centering: Calculating and subtracting the mean pixel value across color channels.
Local Centering: Calculating and subtracting the mean pixel value per color channel.
Per-image global centering is common because it is trivial to implement.
# load image
image = Image.open('sydney_bridge.jpg')
pixels = np.asarray(image)
# convert from integers to floats
pixels = pixels.astype('float32')
# calculate global mean
mean = pixels.mean()
print('Mean: %.3f' % mean)
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
# global centering of pixels
pixels = pixels - mean
# confirm it had the desired effect
mean = pixels.mean()
print('Mean: %.3f' % mean)
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
# load image
image = Image.open('sydney_bridge.jpg')
pixels = np.asarray(image)
# convert from integers to floats
pixels = pixels.astype('float32')
# calculate per-channel means and standard deviations
means = pixels.mean(axis=(0,1), dtype='float64')
print('Means: %s' % means)
print('Mins: %s, Maxs: %s' %
(pixels.min(axis=(0,1)), pixels.max(axis=(0,1))))
# per-channel centering of pixels
pixels -= means
# confirm it had the desired effect
means = pixels.mean(axis=(0,1), dtype='float64')
print('Means: %s' % means)
print('Mins: %s, Maxs: %s' %
(pixels.min(axis=(0,1)), pixels.max(axis=(0,1))))
The official NumPy documentation explains the axis argument as follows:
Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.
If this is a tuple of ints, a mean is performed over multiple axes, instead of a single axis or all the axes as before.
pixels.shape
pixels.mean(axis=0).shape
pixels.mean(axis=1).shape
pixels.mean(axis=2).shape
pixels.mean(axis=(0,1)).shape
There may be benefit in transforming the distribution of pixel values to be a standard Gaussian.
As with centering, the operation can be performed per image, per minibatch, and across the entire training dataset, and it can be performed globally across channels or locally per channel.
Standardization may be preferred to normalization and centering alone and it results in both zero-centered values and small input values, roughly in the range -3 to 3, depending on the specifics of the dataset.
For consistency of the input data, it may make more sense to standardize images per-channel using statistics calculated per minibatch or across the training dataset, if possible.
# load image
image = Image.open('sydney_bridge.jpg')
pixels = np.asarray(image)
# convert from integers to floats
pixels = pixels.astype('float32')
# calculate global mean and standard deviation
mean, std = pixels.mean(), pixels.std()
print('Mean: %.3f, Standard Deviation: %.3f' % (mean, std))
# global standardization of pixels
pixels = (pixels - mean) / std
# confirm it had the desired effect
mean, std = pixels.mean(), pixels.std()
print('Mean: %.3f, Standard Deviation: %.3f' % (mean, std))
There may be a desire to maintain the pixel values in the positive domain, perhaps so the images can be visualized or perhaps for the benefit of a chosen activation function in the model.
# load image
image = Image.open('sydney_bridge.jpg')
pixels = np.asarray(image)
# convert from integers to floats
pixels = pixels.astype('float32')
# calculate global mean and standard deviation
mean, std = pixels.mean(), pixels.std()
print('Mean: %.3f, Standard Deviation: %.3f' % (mean, std))
# global standardization of pixels
pixels = (pixels - mean) / std
# clip pixel values to [-1,1]
pixels = np.clip(pixels, -1.0, 1.0)
# shift from [-1,1] to [0,1] with 0.5 mean
pixels = (pixels + 1.0) / 2.0
# confirm it had the desired effect
mean, std = pixels.mean(), pixels.std()
print('Mean: %.3f, Standard Deviation: %.3f' % (mean, std))
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
The official NumPy documentation describes the clip() function as follows:
Given an interval, values outside the interval are clipped to the interval edges. For example, if an interval of [0, 1] is specified, values smaller than 0 become 0, and values larger than 1 become 1.
np.clip([1,2,3,4,5,6,7,8,9], 4, 6)
# load image
image = Image.open('sydney_bridge.jpg')
pixels = np.asarray(image)
# convert from integers to floats
pixels = pixels.astype('float32')
# calculate per-channel means and standard deviations
means = pixels.mean(axis=(0,1), dtype='float64')
stds = pixels.std(axis=(0,1), dtype='float64')
print('Means: %s, Stds: %s' % (means, stds))
# per-channel standardization of pixels
pixels = (pixels - means) / stds
# confirm it had the desired effect
means = pixels.mean(axis=(0,1), dtype='float64')
stds = pixels.std(axis=(0,1), dtype='float64')
print('Means: %s, Stds: %s' % (means, stds))
def centering(image, standardize=True, local=False, positive=False):
    # convert from integers to floats
    pixels = image.astype('float32')
    # calculate statistics globally, or per channel if local
    axis = None
    if local:
        axis = (0,1)
    means = pixels.mean(axis=axis, dtype='float64')
    # centering of pixels
    pixels = pixels - means
    # standardization of pixels
    if standardize:
        stds = pixels.std(axis=axis, dtype='float64')
        pixels = pixels / stds
    # clip and shift from [-1,1] to [0,1] with 0.5 mean
    if positive:
        pixels = np.clip(pixels, -1.0, 1.0)
        pixels = (pixels + 1.0) / 2.0
    return pixels
def normalize(image):
    # convert from integers to floats
    pixels = image.astype('float32')
    # normalize to the range 0-1
    pixels /= 255.0
    return pixels
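A quick self-contained check of this kind of pixel preparation, using compact versions of the two helpers on a synthetic array (the array values are arbitrary stand-ins for a real photograph):

```python
import numpy as np

def normalize(image):
    # scale uint8 pixel values to the range [0, 1]
    pixels = image.astype('float32')
    pixels /= 255.0
    return pixels

def centering(image, standardize=True, local=False):
    # subtract the mean, computed per channel when local, else globally
    pixels = image.astype('float32')
    axis = (0, 1) if local else None
    pixels = pixels - pixels.mean(axis=axis, dtype='float64')
    if standardize:
        pixels = pixels / pixels.std(axis=axis, dtype='float64')
    return pixels

# synthetic 4x4 RGB image stands in for a loaded photograph
image = np.arange(48, dtype='uint8').reshape((4, 4, 3))
norm = normalize(image)
print(norm.min(), norm.max())  # 0.0 and 47/255
cent = centering(image, standardize=True, local=True)
print(np.allclose(cent.mean(axis=(0, 1)), 0.0, atol=1e-4))  # True
```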
The Keras deep learning library provides a sophisticated API for loading, preparing, and augmenting image data.
The main API is the ImageDataGenerator class that combines data loading, preparation, and augmentation.
Keras provides the load_img() function for loading an image from file as a PIL image object.
# example of loading an image with the Keras API
from tensorflow.keras.preprocessing.image import load_img
# load the image
img = load_img('bondi_beach.jpg')
# report details about the image
print(type(img))
print(img.format)
print(img.mode)
print(img.size)
# show the image
# img.show()
display(img)
The load_img() function provides additional arguments that may be useful when loading the image.
grayscale: allows the image to be loaded in grayscale (defaults to False).
color_mode: allows the image mode or channel format to be specified (defaults to 'rgb').
target_size: allows a tuple of (height, width) to be specified, resizing the image automatically after it is loaded.
Keras provides the img_to_array() function for converting a loaded image in PIL format into a NumPy array for use with deep learning models.
The API also provides the array_to_img() function that can be used for converting a NumPy array of pixel data into a PIL image.
# example of converting an image with the Keras API
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import array_to_img
# load the image
img = load_img('bondi_beach.jpg')
print(type(img))
# convert to numpy array
img_array = img_to_array(img)
print(img_array.dtype)
print(img_array.shape)
# convert back to image
img_pil = array_to_img(img_array)
print(type(img))
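A sketch of those load_img() arguments in use; a generated placeholder file stands in for a real photograph (the filename demo_beach.jpg is hypothetical):

```python
import numpy as np
from PIL import Image
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# write a placeholder image to disk (stands in for a real photograph)
Image.fromarray(np.random.randint(0, 256, (200, 300, 3), dtype='uint8')).save('demo_beach.jpg')
# load in grayscale and resize to 100x100 at load time
img = load_img('demo_beach.jpg', color_mode='grayscale', target_size=(100, 100))
print(img.mode)   # 'L'
print(img.size)   # (100, 100)
print(img_to_array(img).shape)  # (100, 100, 1)
```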
The Keras API also provides the save_img() function to save an image to file. The function takes the path to save the image, and the image data in NumPy array format. The file format is inferred from the filename, but can also be specified via the file_format argument.
# example of saving an image with the Keras API
from tensorflow.keras.preprocessing.image import save_img
# load image as grayscale
img = load_img('bondi_beach.jpg', color_mode='grayscale')
# convert image to a numpy array
img_array = img_to_array(img)
# save the image with a new filename
save_img('bondi_beach_grayscale.jpg', img_array)
# load the image to confirm it was saved correctly
img = load_img('bondi_beach_grayscale.jpg')
print(type(img))
print(img.format)
print(img.mode)
print(img.size)
# img.show()
display(img)
An approach is to scale the images using a preferred scaling technique just-in-time during the training or model evaluation process. Keras supports this type of data preparation for image data via the ImageDataGenerator class and API.
! ls ~/.keras/datasets/
# load and summarize the MNIST dataset
from tensorflow.keras.datasets import mnist
# load dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# summarize dataset shape
print('Train', train_images.shape, train_labels.shape)
print('Test', (test_images.shape, test_labels.shape))
# summarize pixel values
print('Train', train_images.min(), train_images.max(), train_images.mean(), train_images.std())
print('Test', test_images.min(), test_images.max(), test_images.mean(), test_images.std())
from matplotlib import pyplot as plt
plt.imshow(train_images[0])
plt.show()
ImageDataGenerator Class for Pixel Scaling
The class will wrap your image dataset, then when requested, it will return images in batches to the algorithm during training, validation, or evaluation and apply the scaling operations just-in-time.
The usage of the ImageDataGenerator class is as follows.
1. Load your dataset.
2. Configure the ImageDataGenerator (e.g. construct an instance).
The constructor arguments determine how the global statistics are calculated during fit(); see the API documentation for details.
3. Calculate image statistics (e.g. call the fit() function).
4. Use the generator to fit the model (e.g. pass the instance to the fit_generator() function).
5. Use the generator to evaluate the model (e.g. pass the instance to the evaluate_generator() function).
The ImageDataGenerator class supports a number of pixel scaling methods, as well as a range of data augmentation techniques. We will focus on the pixel scaling techniques in this chapter and leave the data augmentation methods to a later discussion.
The three main types of pixel scaling techniques supported by the ImageDataGenerator class are as follows:
Pixel Normalization: scale pixel values to the range 0-1.
Pixel Centering: scale pixel values to have a zero mean.
Pixel Standardization: scale pixel values to have a zero mean and unit variance.
Pixel standardization is supported at two levels: either per-image (called sample-wise) or per-dataset (called feature-wise).
Other pixel scaling methods are supported, such as ZCA, brightening, and more, but we will focus on these three most common methods.
If the chosen scaling method requires that statistics be calculated across the training dataset, then these statistics can be calculated and stored by calling the fit() function.
In other words, fit() is only needed when there are global statistics to calculate. The fitted statistics are stored on the generator object, and every subsequent call to the flow() method uses them. When evaluating and selecting a model, it is common to calculate these statistics on the training dataset and then apply them to the validation and test datasets.
Once prepared, the data generator can be used to fit a neural network model by calling the flow() function to retrieve an iterator that returns batches of samples and passing it to the fit_generator() function.
# create data generator
datagen = ImageDataGenerator(args...)
# calculate scaling statistics on the training dataset
datagen.fit(trainX)
# get batch iterator
train_iterator = datagen.flow(trainX, trainy)
# fit model
model.fit_generator(train_iterator, ...)
If a validation dataset is required, a separate batch iterator can be created from the same data generator that will perform the same pixel scaling operations and use any required statistics calculated on the training dataset.
# get batch iterator for training
train_iterator = datagen.flow(trainX, trainy)
# get batch iterator for validation
val_iterator = datagen.flow(valX, valy)
# fit model
model.fit_generator(train_iterator, validation_data=val_iterator, ...)
Once fit, the model can be evaluated by creating a batch iterator for the test dataset and calling the evaluate_generator() function on the model. Again, the same pixel scaling operations will be performed and any statistics calculated on the training dataset will be used, if needed.
# get batch iterator for testing
test_iterator = datagen.flow(testX, testy)
# evaluate model loss on test dataset
loss = model.evaluate_generator(test_iterator, ...)
ImageDataGenerator
This can be achieved by setting the rescale argument to a ratio by which each pixel can be multiplied to achieve the desired range. In this case, the ratio is $\frac{1}{255}$ or about 0.0039.
The ImageDataGenerator does not need to be fit in this case because there are no global statistics, like the mean and standard deviation, that need to be calculated.
We will use a batch size of 64.
This means that each of the train and test datasets of images are divided into groups of 64 images that will then be scaled when returned from the iterator.
# example of normalizing an image dataset
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# load dataset
(trainX, trainY), (testX, testY) = mnist.load_data()
# reshape dataset to have a single channel
trainX = trainX.reshape(trainX.shape + (1,))
testX = testX.reshape(testX.shape + (1,))
# confirm scale of pixels
print('Train min=%.3f, max=%.3f' % (trainX.min(), trainX.max()))
print('Test min=%.3f, max=%.3f' % (testX.min(), testX.max()))
# create generator (1.0/255.0 = 0.003921568627451)
datagen = ImageDataGenerator(rescale=1.0/255.0)
# Note: there is no need to fit the generator in this case
# prepare iterators to scale images
train_iterator = datagen.flow(trainX, trainY, batch_size=64)
test_iterator = datagen.flow(testX, testY, batch_size=64)
print('Batches train=%d, test=%d' % (len(train_iterator), len(test_iterator)))
# confirm the scaling works
batchX, batchy = train_iterator.next()
print('Batch shape=%s, min=%.3f, max=%.3f' % (batchX.shape, batchX.min(), batchX.max()))
ImageDataGenerator
Previously the ImageDataGenerator object was constructed with the rescale argument; for centering, the featurewise_center and samplewise_center arguments are used instead.
Another popular pixel scaling method is to calculate the mean pixel value across the entire training dataset, then subtract it from each image.
This is called centering and has the effect of centering the distribution of pixel values on zero: that is, the mean pixel value for centered images will be zero.
The ImageDataGenerator class refers to centering that uses the mean calculated on the training dataset as feature-wise centering. It requires that the statistic is calculated on the training dataset prior to scaling.
# example of centering an image dataset
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# load dataset
(trainX, trainy), (testX, testy) = mnist.load_data()
# reshape dataset to have a single channel
trainX = trainX.reshape(trainX.shape + (1,))
testX = testX.reshape(testX.shape + (1,))
# report mean pixel values of the train and test datasets
print('Means train=%.3f, test=%.3f' % (trainX.mean(), testX.mean()))
# create generator that centers pixel values
datagen = ImageDataGenerator(featurewise_center=True)
# calculate the mean on the training dataset
datagen.fit(trainX)
print('Data Generator Mean: %.3f' % datagen.mean)
This is different from calculating the mean pixel value for each image, which Keras refers to as sample-wise centering and which does not require any statistics to be calculated on the training dataset.
# create generator that centers pixel values per image
datagen2 = ImageDataGenerator(samplewise_center=True)
# fit() computes no statistics for sample-wise centering (mean stays None)
datagen2.fit(trainX)
print('Data Generator 2 Mean: ', datagen2.mean)
# demonstrate effect on a single batch of samples
iterator = datagen.flow(trainX, trainy, batch_size=64)
# get a batch
batchX, batchy = iterator.next()
# mean pixel value in the batch
print(batchX.shape, batchX.mean())
# demonstrate effect on entire training dataset
iterator = datagen.flow(trainX, trainy, batch_size=len(trainX), shuffle=False)
# get a batch
batchX, batchy = iterator.next()
# mean pixel value in the batch
print(batchX.shape, batchX.mean())
If samplewise_center is used, the batch mean is approximately 0.
# demonstrate effect on a single batch of samples
iterator = datagen2.flow(trainX, trainy, batch_size=64)
# get a batch
batchX, batchy = iterator.next()
# mean pixel value in the batch
print(batchX.shape, batchX.mean())
ImageDataGenerator
When constructing the ImageDataGenerator object, additionally pass the featurewise_std_normalization or samplewise_std_normalization argument.
# example of standardizing an image dataset
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# load dataset
(trainX, trainy), (testX, testy) = mnist.load_data()
# reshape dataset to have a single channel
trainX = trainX.reshape(trainX.shape + (1,))
testX = testX.reshape(testX.shape + (1,))
# report pixel means and standard deviations
print('Statistics train=%.3f (%.3f), test=%.3f (%.3f)' % (trainX.mean(), trainX.std(), testX.mean(), testX.std()))
# create generator that centers and standardizes pixel values
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
# calculate the mean on the training dataset
datagen.fit(trainX)
print('Data Generator mean=%.3f, std=%.3f' % (datagen.mean, datagen.std))
With sample-wise scaling, no global statistics are calculated.
# create generator that standardizes pixel values per image
datagen2 = ImageDataGenerator(samplewise_center=True, samplewise_std_normalization=True)
# fit() computes no statistics for sample-wise scaling
datagen2.fit(trainX)
print('Data Generator mean=%s, std=%s' % (datagen2.mean, datagen2.std))
# demonstrate effect on a single batch of samples
iterator = datagen.flow(trainX, trainy, batch_size=64, shuffle=False)
# get a batch
batchX, batchy = iterator.next()
# pixel stats in the batch
print(batchX.shape, batchX.mean(), batchX.std())
# demonstrate effect on entire training dataset
iterator = datagen.flow(trainX, trainy, batch_size=len(trainX), shuffle=False)
# get a batch
batchX, batchy = iterator.next()
# pixel stats in the batch
print(batchX.shape, batchX.mean(), batchX.std())
With sample-wise scaling, the mean of each batch is also approximately 0 (and the standard deviation approximately 1).
# demonstrate effect on a single batch of samples
iterator = datagen2.flow(trainX, trainy, batch_size=64, shuffle=False)
# get a batch
batchX, batchy = iterator.next()
# pixel stats in the batch
print(batchX.shape, batchX.mean(), batchX.std())
For images with 3 color channels, the sample-wise statistics are calculated per image across all channels (globally), not per channel.
The zca_epsilon and zca_whitening arguments enable ZCA whitening, which requires at least centering to be applied.
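A minimal sketch of enabling ZCA whitening on the MNIST arrays used earlier; fitting on a small subset is an assumption here, made only to keep the covariance computation cheap:

```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# load and reshape as in the earlier examples
(trainX, trainy), _ = mnist.load_data()
trainX = trainX.reshape(trainX.shape + (1,))
# ZCA whitening needs centering; fit() computes the whitening matrix
datagen = ImageDataGenerator(featurewise_center=True, zca_whitening=True, zca_epsilon=1e-6)
datagen.fit(trainX[:1000])
# batches returned by flow() are whitened just-in-time
it = datagen.flow(trainX[:1000], trainy[:1000], batch_size=64)
batchX, batchy = it.next()
print(batchX.shape)  # (64, 28, 28, 1)
```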
There are conventions for storing and structuring your image dataset on disk in order to make it fast and efficient to load and when training and evaluating deep learning models.
The purpose of this layout is to avoid reading all of the images into memory at once.
First, we have a data/ directory where we will store all of the image data. Next, we will have a data/train/ directory for the training dataset and a data/test/ for the holdout test dataset. We may also have a data/validation/ for a validation dataset during training. So far, we have:
! [ -d "data" ] && rm -r data
! mkdir data data/train data/test data/validation
! tree data
Under each of the dataset directories, we will have subdirectories, one for each class where the actual image files will be placed.
! for sub in "train" "test" "validation"; do mkdir data/${sub}/cracked data/${sub}/none; done
! tree data
A good naming convention, if you have the ability to rename files consistently, is to use some name followed by a number with zero padding, e.g. image0001.jpg if you have thousands of images for a class.
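As a sketch, zero-padded names can be generated with standard string formatting; the throwaway directory and dummy filenames below are stand-ins for a real image folder:

```python
import os, tempfile

# create a throwaway directory with unevenly named files (stand-ins for images)
src_dir = tempfile.mkdtemp()
for name in ['photo_a.jpg', 'img7.jpg', 'pic.jpg']:
    open(os.path.join(src_dir, name), 'w').close()

# rename to a zero-padded convention, e.g. image0001.jpg
for i, name in enumerate(sorted(os.listdir(src_dir)), start=1):
    ext = os.path.splitext(name)[1]
    os.rename(os.path.join(src_dir, name),
              os.path.join(src_dir, 'image%04d%s' % (i, ext)))

print(sorted(os.listdir(src_dir)))  # ['image0001.jpg', 'image0002.jpg', 'image0003.jpg']
```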
! [ -d "data" ] && rm -r data
! cp -r /Users/ray/Google\ 云端硬盘/自学/ML\ \&\ DL/Deep\ Learning\ for\ Computer\ Vision/code/chapter_08/data data
! tree data
Instead of loading all images into memory, it will load just enough images into memory for the current and perhaps the next few mini-batches when training and evaluating a deep learning model.
The pattern for using the ImageDataGenerator class is as follows:
1. Construct and configure an instance of the ImageDataGenerator class.
2. Retrieve an iterator by calling the flow_from_directory() function.
Of note is the target_size argument that allows you to load all images to a specific size, which is often required when modeling. The function defaults to square images with the size (256, 256).
The function also allows you to specify the type of classification task via the class_mode argument, specifically whether it is binary or a multiclass classification (categorical).
The default batch_size is 32, which means that 32 randomly selected images from across the classes in the dataset will be returned in each batch when training.
You may also want to return batches in a deterministic order when evaluating a model, which you can do by setting shuffle to False.
The subdirectories of images, one for each class, are loaded by the flow_from_directory() function in alphabetical order and assigned an integer for each class. For example, the subdirectory blue comes before red alphabetically, therefore the class labels are assigned the integers blue=0, red=1. This can be changed via the classes argument when calling flow_from_directory() when training the model.
3. Use the iterator in the training or evaluation of a model.
We can use the same ImageDataGenerator to prepare separate iterators for separate dataset directories. This is useful if we would like the same pixel scaling applied to multiple datasets (e.g. train, test, etc.).
# create generator
datagen = ImageDataGenerator(samplewise_center=True, samplewise_std_normalization=True)
# prepare an iterators for each dataset
train_it = datagen.flow_from_directory('data/train/', class_mode='binary')
val_it = datagen.flow_from_directory('data/validation/', class_mode='binary')
test_it = datagen.flow_from_directory('data/test/', class_mode='binary')
# confirm the iterator works
batchX, batchy = train_it.next()
print('Batch shape=%s, min=%.3f, max=%.3f, mean=%.3f' % (batchX.shape, batchX.min(), batchX.max(), batchX.mean()))
Fitting a model with a data generator can be achieved by calling the fit_generator() function on the model and passing the training iterator (train_it).
The validation iterator (val_it) can be specified when calling this function via the validation_data argument. The steps_per_epoch argument must be specified for the training iterator in order to define how many batches of images defines a single epoch.
The steps_per_epoch argument matters because the ImageDataGenerator can also perform augmentation (resampling), so 1,000 source images can effectively yield a larger training set. With a batch size of 50, for example, you could set steps_per_epoch=40 so that each image is used twice per epoch on average.
Similarly, if a validation iterator is applied, then the validation_steps argument must also be specified to indicate the number of batches in the validation dataset defining one epoch.
# define model
model = ...
# fit model
model.fit_generator(train_it, steps_per_epoch=16, validation_data=val_it,
validation_steps=8)
Once the model is fit, it can be evaluated on a test dataset using the evaluate_generator() function and passing in the test iterator (test_it). The steps argument defines the number of batches of samples to step through when evaluating the model before stopping.
# evaluate model
loss = model.evaluate_generator(test_it, steps=24)
Finally, if you want to use your fit model for making predictions on a very large dataset, you can create an iterator for that dataset as well (e.g. predict_it) and call the predict_generator() function on the model.
# make a prediction
yhat = model.predict_generator(predict_it, steps=24)
image = Image.open("bird.jpg")
image.size
image.format
image.mode
display(image)
The Keras deep learning library provides the ability to use data augmentation automatically when training a model.
First, the class must be instantiated and the configuration for the types of data augmentation are specified by arguments to the class constructor. A range of techniques are supported, as well as pixel scaling methods.
Image shifts via the width_shift_range and height_shift_range arguments.
Image flips via the horizontal_flip and vertical_flip arguments.
Image rotations via the rotation_range argument.
Image brightness via the brightness_range argument.
Image zoom via the zoom_range argument.
The width_shift_range and height_shift_range arguments to the ImageDataGenerator constructor control the amount of horizontal and vertical shift respectively. These arguments can specify a floating point value that indicates the percentage (between 0 and 1) of the width or height of the image to shift. Alternately, a number of pixels can be specified to shift the image.
Specifically, a value between the negative maximum shift and the positive maximum shift will be sampled for each image and the shift performed, e.g. from [-0.5, 0.5]. If a single number is given instead of a range, shifts are sampled from [-value, +value].
import numpy as np
import matplotlib.pyplot as plt
from keras.preprocessing.image import load_img, img_to_array, ImageDataGenerator
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = np.expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(width_shift_range=[-200,200])
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
plt.figure(figsize=(10,10))
for i in range(9):
    # define subplot
    plt.subplot(330 + 1 + i)
    # generate batch of images
    batch = next(it)
    # convert to unsigned integers for viewing
    image = batch[0].astype('uint8')
    # plot raw pixel data
    plt.imshow(image)
# show the figure
plt.show()
We can see in the plot of the result that a range of different randomly selected positive and negative horizontal shifts was performed and the pixel values at the edge of the image are duplicated to fill in the empty part of the image created by the shift.
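The edge-duplication fill can be illustrated with plain NumPy. The sketch below uses a toy 4x4 single-channel "image" and mimics the default fill_mode='nearest' behavior for a horizontal shift; it is an illustration of the idea, not Keras' actual implementation:

```python
import numpy as np

# toy single-channel 4x4 "image"
img = np.arange(16, dtype=np.uint8).reshape(4, 4)

# shift right by 2 pixels; fill the vacated columns by duplicating
# the left edge column, mimicking fill_mode='nearest'
shift = 2
shifted = np.empty_like(img)
shifted[:, shift:] = img[:, :-shift]
shifted[:, :shift] = img[:, :1]
print(shifted[0])  # first row: the edge value 0 is repeated
```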
Below is the same example updated to perform vertical shifts of the image via the height_shift_range argument, in this case specifying the shift as a fraction of 0.5, i.e. up to half the height of the image.
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = np.expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(height_shift_range=0.5)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
plt.figure(figsize=(10,10))
for i in range(9):
    # define subplot
    plt.subplot(330 + 1 + i)
    # generate batch of images
    batch = next(it)
    # convert to unsigned integers for viewing
    image = batch[0].astype('uint8')
    # plot raw pixel data
    plt.imshow(image)
# show the figure
plt.show()
Note that other fill modes can be specified via the fill_mode argument (e.g. 'nearest', 'constant', 'reflect', or 'wrap').
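For contrast with the edge-duplication behavior, a constant fill (fill_mode='constant' with cval=0) leaves the vacated region at a fixed value. A NumPy sketch of a downward shift on the same kind of toy image (again an illustration, not the library's internals):

```python
import numpy as np

# toy single-channel 4x4 "image"
img = np.arange(16, dtype=np.uint8).reshape(4, 4)

# fill_mode='constant': vacated pixels take a fixed value (cval, here 0)
shift = 2
shifted = np.zeros_like(img)          # pre-fill with the constant value
shifted[shift:, :] = img[:-shift, :]  # shift the content down by two rows
print(shifted)
```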
The flip augmentation is specified by a boolean horizontal_flip or vertical_flip argument to the ImageDataGenerator class constructor.
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = np.expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(horizontal_flip=True)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
plt.figure(figsize=(10,10))
for i in range(9):
    # define subplot
    plt.subplot(330 + 1 + i)
    # generate batch of images
    batch = next(it)
    # convert to unsigned integers for viewing
    image = batch[0].astype('uint8')
    # plot raw pixel data
    plt.imshow(image)
# show the figure
plt.show()
A rotation augmentation randomly rotates the image by an angle sampled from the configured degree range (up to 360 degrees). The rotation will likely move pixels out of the image frame and leave areas of the frame with no pixel data that must be filled in. The example below demonstrates random rotations via the rotation_range argument, in this case sampling rotations of up to 90 degrees.
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = np.expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(rotation_range=90)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
plt.figure(figsize=(10,10))
for i in range(9):
    # define subplot
    plt.subplot(330 + 1 + i)
    # generate batch of images
    batch = next(it)
    # convert to unsigned integers for viewing
    image = batch[0].astype('uint8')
    # plot raw pixel data
    plt.imshow(image)
# show the figure
plt.show()
The brightness of the image can be augmented by randomly darkening images, brightening images, or both. The intent is to allow a model to generalize across images captured under different lighting levels. This is achieved by passing the brightness_range argument to the ImageDataGenerator() constructor: a min and max given as floats that act as multiplicative factors when selecting a darkening or brightening amount.
Values less than 1.0 darken the image, whereas values larger than 1.0 brighten it.
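Conceptually, the augmentation scales every pixel of an image by one factor sampled from the configured range. A hedged NumPy sketch of that idea (Keras' internal implementation may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(0)
img = np.full((2, 2, 3), 100.0)   # constant mid-gray image
# one factor per image, as with brightness_range=[0.5, 1.5]
factor = rng.uniform(0.5, 1.5)
# scale all pixels and clip back into the valid 0..255 range
bright = np.clip(img * factor, 0.0, 255.0)
print(bright[0, 0, 0])
```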
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = np.expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(brightness_range=[0.5,1.5])
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
plt.figure(figsize=(10,10))
for i in range(9):
    # define subplot
    plt.subplot(330 + 1 + i)
    # generate batch of images
    batch = next(it)
    # convert to unsigned integers for viewing
    image = batch[0].astype('uint8')
    # plot raw pixel data
    plt.imshow(image)
# show the figure
plt.show()
A zoom augmentation randomly zooms the image, either interpolating pixel values (when zooming in) or adding new pixel values around the image (when zooming out). Image zooming can be configured by the zoom_range argument to the ImageDataGenerator constructor.
You can specify the zoom as a single float or a range as an array or tuple.
If a float is specified, then the range for the zoom will be
[1-value, 1+value]. Note that zoom factors smaller than 1.0 enlarge the image content (zoom in), while factors larger than 1.0 shrink it (zoom out).
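How a single float expands into a sampling range can be shown with a couple of lines. The per-axis sampling below mirrors how independent horizontal and vertical zoom factors are drawn; the sketch only illustrates the arithmetic, not the library's internals:

```python
import random

random.seed(1)
zoom_range = 0.3
# a float 0.3 becomes the range [0.7, 1.3]
lo, hi = 1 - zoom_range, 1 + zoom_range
# an independent zoom factor is sampled per axis
zx = random.uniform(lo, hi)
zy = random.uniform(lo, hi)
print(lo, hi)
```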
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = np.expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(zoom_range=[0.5,1.5])
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
plt.figure(figsize=(10,10))
for i in range(9):
    # define subplot
    plt.subplot(330 + 1 + i)
    # generate batch of images
    batch = next(it)
    # convert to unsigned integers for viewing
    image = batch[0].astype('uint8')
    # plot raw pixel data
    plt.imshow(image)
# show the figure
plt.show()
Finally, all of the augmentations above can be combined in a single ImageDataGenerator by passing the corresponding arguments together.
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = np.expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(zoom_range=[0.5,1.5],
                             brightness_range=[0.5,1.5],
                             rotation_range=90,
                             horizontal_flip=True,
                             vertical_flip=True,
                             width_shift_range=[-200,200],
                             height_shift_range=0.5)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
plt.figure(figsize=(10,10))
for i in range(9):
    # define subplot
    plt.subplot(330 + 1 + i)
    # generate batch of images
    batch = next(it)
    # convert to unsigned integers for viewing
    image = batch[0].astype('uint8')
    # plot raw pixel data
    plt.imshow(image)
# show the figure
plt.show()